Hello everyone, good afternoon. My name is Alexandre Borges. I'm a malware and security
researcher at BlackStorm Security. Let's talk about malware obfuscation and
emulation. It's a very short presentation, only 190 slides, so let's go. This is my
profile and this is my agenda for today. We will see something about anti-reversing and the
METASM framework, then something about Miasm, Triton, and radare2. We will touch
on DTrace on Windows, and finally a very interesting trick used
as an anti-virtual machine technique.
Let's start. I've been analyzing malware for many years. Honestly, most samples use
well-known packers: ASPack, Armadillo, Petite, and so on. That's okay. We know how to unpack them.
Most of them are pretty easy to unpack.
We also know the main memory APIs, like these ones in red. We also
know how to use a debugger. We know how to set breakpoints. We know how
to dump an unpacked binary from memory and how to dump injected code from memory.
We can use very special tools like PE-sieve from hasherezade, which is an excellent tool.
We can use Volatility to dump injected code and unpacked code from memory and fix
its import table by using, for example, a very special plugin named impscan.
It's so easy to do that.
We can also write very short scripts for debuggers.
For example, in this case, I wrote a very short script to dump unpacked or injected code from memory in x64dbg.
I tried to comment line by line to help you later.
We have one, two, three slides containing these scripts.
However, the real world is a bit different.
For example, yes, I can write plugins for IDA Pro.
It's simple, and it's very common to do that.
I can write new loaders, for example, to load an unusual binary such as an MBR file.
But modern packers have been using different anti-reversing tricks to make our lives harder.
Some packers, such as VMProtect, Themida, Arxan, and Agile.NET, use tricks that are hard to bypass.
Our goal here is to explain what we can do to circumvent some of these tricks.
Most of these packers are focused on 64-bit code.
They use many different tricks.
For example, these advanced packers remove the import tables, of course.
They try to encrypt every single string inside the code.
They protect the memory using a kind of memory integrity check.
In this case, it's almost impossible to dump clean code from memory, because the code stays encrypted in memory.
You cannot modify it or dump the whole code from memory.
These packers use very interesting tricks when protecting .NET code.
For example, they rename classes, methods, fields, and so on.
But much worse, these packers virtualize Intel instructions into a very special kind of instruction.
I mean, an Intel instruction is translated to run in a virtual machine environment
using special code, sometimes RISC code.
In this case, it's very complicated to bypass.
Additional tricks are used by these advanced packers.
All instructions are encrypted, unfortunately.
Most of the obfuscation is stack-based,
so it's very hard to handle statically.
The virtualized code is polymorphic.
For example, one Intel instruction can be mapped to several different virtualized instructions.
We have lots of fake push instructions.
We have a lot of dead code.
We have code reordering, which produces a kind of spaghetti code.
We have a very special trick named code flattening.
I will speak about that later.
And we have tons of anti-debugging and anti-virtual machine techniques.
We don't know anything about the internals of these virtual machines.
These protectors use very special, private virtual machines
to translate the Intel code to virtualized code.
However, we know that the general idea is always the same.
An instruction must be fetched and decoded.
We have to find the pointer to the handlers.
And finally, we have to execute each one of these handlers.
That's the general idea.
I have an instruction.
This instruction is decoded,
and once it is decoded,
the dispatcher picks one of the handlers to process it.
In the real world, it's a bit more complicated,
because the virtualized instructions are organized into an array.
In this case, you won't see many opcodes in the virtualized code.
You will see indexes.
The indexes point to the encrypted instructions.
Once an instruction is decoded,
its opcode points to a function pointer.
And finally, from the function pointer,
we have the handler being executed.
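The fetch, decode, and dispatch loop described above can be sketched as a toy interpreter in Python. This is only an illustration under stated assumptions: the opcode numbers, the XOR key, and the three handlers are all invented here, and no real protector is claimed to work exactly this way.

```python
# Toy model of a virtualizing protector's interpreter loop.
# Opcode numbers, the XOR key, and the handlers are invented for illustration.

KEY = 0x5A  # hypothetical key used to "encrypt" the virtual bytecode


def h_push(vm, operand):
    vm["stack"].append(operand)


def h_add(vm, operand):
    b = vm["stack"].pop()
    a = vm["stack"].pop()
    vm["stack"].append((a + b) & 0xFFFFFFFF)


def h_halt(vm, operand):
    vm["running"] = False


# The handler table: a decoded opcode is only an index into this array.
HANDLERS = [h_push, h_add, h_halt]


def encrypt(program):
    """Encode (opcode, operand) pairs the way the toy protector stores them."""
    return [(op ^ KEY, operand ^ KEY) for op, operand in program]


def run(bytecode):
    vm = {"stack": [], "running": True, "pc": 0}
    while vm["running"]:
        enc_op, enc_operand = bytecode[vm["pc"]]        # fetch
        op, operand = enc_op ^ KEY, enc_operand ^ KEY   # decode
        vm["pc"] += 1
        HANDLERS[op](vm, operand)                       # dispatch to the handler
    return vm["stack"]


if __name__ == "__main__":
    program = [(0, 3), (0, 6), (1, 0), (2, 0)]  # push 3; push 6; add; halt
    print(run(encrypt(program)))  # prints [9]
```

Someone reading only the encrypted bytecode sees indexes and operands that mean nothing until the key and the handler table are recovered, which is the difficulty described in the talk.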
Additionally, we have other tricks,
other anti-reversing techniques being used.
For example, constant unfolding.
In this case, these advanced packers take one constant
and turn it into several constants.
We have pattern-based obfuscation.
I will talk about that later.
We have many inlined functions inside the code.
We have tons of anti-virtual machine techniques.
We have tons of garbage code.
And of course, we have a lot of duplicated code.
Furthermore, we have other tricks being used.
For example,
we have control indirection.
There are some very interesting tricks here,
for example, using the return instruction to skip code.
These packers use exceptions.
These packers also use opaque predicates.
The opaque predicate is a very well-known trick.
It's basically a back-to-back JZ and JNZ instruction pair.
Apparently, you have a conditional jump,
but in fact the same branch is always taken,
and the other one is never taken.
I will talk about that later, too.
I have a very educational example.
I know that we don't have enough time to see and learn
how to write IDA Pro plugins here,
but I left some step-by-step procedures to help you later
with setting up your environment
to write IDA Pro plugins using Visual Studio.
We have one, two, three slides explaining how to set up your environment
to write an IDA Pro plugin.
I've been using IDA Pro plugins to handle these complex anti-reversing cases.
Of course, we don't have enough time here, unfortunately.
But I wrote a very simple IDA Pro plugin.
This plugin is very basic.
I tried to comment every single line,
every single important instruction, to help you.
This plugin aims to find web links inside a binary.
I executed it,
and as you see, it works.
I'm able to find, for example, web links inside a binary.
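The idea of searching a binary for web links can be tried outside IDA Pro as well. The sketch below is not the plugin from the slides; it is a standalone Python approximation that scans raw bytes with a regular expression, and the pattern and the sample buffer are my own assumptions.

```python
import re

# A deliberately simple heuristic: "http://" or "https://" followed by
# characters that are legal in a URL. Real samples may need a stricter pattern.
URL_RE = re.compile(rb"https?://[A-Za-z0-9._~:/?#\[\]@!$&'()*+,;=%-]+")


def find_links(data: bytes):
    """Return every http/https link found in a raw byte buffer."""
    return [m.group().decode("ascii") for m in URL_RE.finditer(data)]


if __name__ == "__main__":
    # Made-up buffer standing in for the contents of a binary file.
    sample = b"\x00\x01MZ\x90http://example.com/a.php\x00junk\xffhttps://example.org\x00"
    print(find_links(sample))
```

To scan a real file, the buffer would come from `open(path, "rb").read()`. Links that the sample stores encrypted or builds at runtime will not be found this way.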
You can also use processor modules as an extension of plugins in IDA Pro.
It's hard to explain that now,
but I left a very basic explanation here to help.
But returning to our problem,
imagine this program.
It's a very, very simple program
written in C.
C is quite easy.
This program has a linear execution.
I mean, the execution is straight,
only these blocks.
Obviously,
we don't have any kind of obfuscation here.
When we disassemble it in IDA Pro,
the code is very simple, too.
But my idea is to use a very old
anti-reversing trick
named code flattening.
When I apply code flattening to this code, I transform it from linear execution to multi-branch execution.
It's the same code, but in one case we have a linear execution, and in the other we have a multi-branch execution.
When I open this code in IDA Pro, I see a multi-branch disassembly.
You can try it and run some tests using code flattening.
For example, I use Obfuscator-LLVM to do that.
I left the complete procedure for installing it.
At the bottom, I show you how to run this obfuscator.
And, as you see, we got the same code, obfuscated.
Pay attention. Look at that.
The same code, but this time we have a multi-branch execution.
Obfuscation is widely used in the industry.
It's very hard to circumvent, but we have some techniques to do that.
This is an overview.
And this is the decompiled code.
Look at that.
Originally, we had only one while statement.
Now, we have one while statement, three if statements, and one more while statement.
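The effect of code flattening can be reproduced by hand on a tiny function. This is a hand-made Python analogy, not the output of any obfuscator, and the function names and the state numbering are invented for the example.

```python
def linear(n):
    """Original, straight-line version: sum of 0..n-1."""
    total = 0
    i = 0
    while i < n:
        total += i
        i += 1
    return total


def flattened(n):
    """Same logic after a hand-made flattening: every basic block becomes a
    case selected by a state variable inside one dispatcher loop."""
    state = 0
    total = i = 0
    while True:
        if state == 0:        # entry block
            total, i = 0, 0
            state = 1
        elif state == 1:      # loop condition
            state = 2 if i < n else 3
        elif state == 2:      # loop body
            total += i
            i += 1
            state = 1
        else:                 # exit block
            return total


if __name__ == "__main__":
    print(linear(5), flattened(5))  # prints 10 10
```

Both functions compute the same value, but the second one hides the original order of the blocks behind the state variable, so a reader has to track `state` to recover the real control flow.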
This is a very short example of opaque predicates.
Take a look.
We have two conditional jumps, the JZ and JNZ instructions, back to back.
However, the instructions just above establish that only the JZ is taken.
The other jump is never taken.
It's a well-known trick that you meet when you are handling anti-reversing techniques.
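A small model makes the opaque predicate concrete. The talk does not say which instruction fixes the flag before the two jumps, so the `xor eax, eax` used below is my assumption, as are the function and label names.

```python
def xor_sets_zf(value):
    """Model `xor reg, reg`: the result is always 0, so ZF is always set."""
    result = value ^ value
    return result == 0  # this boolean plays the role of the zero flag


def follow(value):
    """Model the sequence `xor eax, eax` / `jz real_code` / `jnz fake_code`."""
    zf = xor_sets_zf(value)
    if zf:            # jz real_code: always taken
        return "real_code"
    if not zf:        # jnz fake_code: dead branch, never reached
        return "fake_code"


if __name__ == "__main__":
    print({follow(v) for v in range(-1000, 1000)})  # prints {'real_code'}
```

Whatever value the register held before, only one branch is ever followed, so the second jump and the code behind it exist only to mislead the disassembler and the analyst.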
This is a shellcode, a very simple shellcode.
There, at the top, in red, we have the decryption routine.
Very simple.
I wrote a very simple IDA Pro script, decoded it statically, and you can see the decrypted shellcode in light blue below.
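Static decoding of this kind can be sketched in a few lines. The transcript does not say which algorithm the decryption routine uses, so the single-byte XOR below is purely an assumption for illustration, and the key, the marker, and the sample bytes are made up.

```python
def xor_decode(blob: bytes, key: int) -> bytes:
    """Decode a buffer protected with a single-byte XOR (assumed scheme)."""
    return bytes(b ^ key for b in blob)


def guess_key(blob: bytes, marker: bytes):
    """Try all 256 keys and return the first one whose output contains
    a known plaintext marker, or None when no key matches."""
    for key in range(256):
        if marker in xor_decode(blob, key):
            return key
    return None


if __name__ == "__main__":
    # Made-up "shellcode" body, encoded here only to have something to decode.
    plain = b"\x90\x90cmd.exe\x00"
    encoded = xor_decode(plain, 0x41)
    key = guess_key(encoded, b"cmd.exe")
    print(hex(key), xor_decode(encoded, key) == plain)  # prints 0x41 True
```

In practice the key and the algorithm are read from the decryption stub itself, as in the red routine on the slide, and the brute-force helper is only a fallback when a plaintext marker is known.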
Another widely used anti-reversing trick is call stack manipulation.
Apparently, it's very easy code to read.
However, pay attention.
The RET instruction, the return instruction, is not a true return.
This return instruction is skipping these blue instructions, and the true return instruction is this one.
This technique is used a lot by this kind of advanced protector.
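The fake return can be modeled with a toy stack machine. The instruction list below is invented and is not the code on the slide; it only assumes the usual pattern in which an address is pushed and a `ret` then pops it and jumps there.

```python
def run(program):
    """Execute a toy instruction list and return the labels actually reached."""
    stack, pc, trace = [], 0, []
    while True:
        op, arg = program[pc]
        if op == "push":      # push an address chosen by the protector
            stack.append(arg)
            pc += 1
        elif op == "mark":    # stands in for ordinary instructions
            trace.append(arg)
            pc += 1
        elif op == "ret":     # ret = pop an address and jump to it
            if not stack:
                # Simplification: an empty stack models the true return
                # to the caller, so the toy function ends here.
                return trace
            pc = stack.pop()


PROGRAM = [
    ("mark", "prologue"),    # 0
    ("push", 5),             # 1: address of the code to run next
    ("ret", None),           # 2: fake return, really a jump to index 5
    ("mark", "skipped_1"),   # 3: never executed
    ("mark", "skipped_2"),   # 4: never executed
    ("mark", "real_code"),   # 5
    ("ret", None),           # 6: the true return
]

if __name__ == "__main__":
    print(run(PROGRAM))  # prints ['prologue', 'real_code']
```

A disassembler that treats the first `ret` as the end of the function stops too early, while the instructions placed right after it are never executed at all.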
So, as I promised, I will show some tricks to handle different kinds of obfuscation.
Let's take a very educational sample.
Imagine this scenario.
I have just one instruction, in green: add eax, ecx.
Once this instruction is obfuscated, we have the second stage, in yellow.
Obfuscate it again, and we have the third stage, in orange.
Finally, we have the last stage, stage four, in red.
The question is, how can I reverse the process?
How can I start from stage four and find that all of these instructions are equal to just one instruction, add eax, ecx?
First, I will use a very interesting framework.
METASM is an amazing framework.
METASM supports different platforms,
and I show you how to install METASM at the bottom, step by step.
OK? I tested it.
And let's go.
Take a look.
I pick up all of these instructions from stage four,
and I start here.
My choice is to use 32 bits.
I tried to comment line by line, block by block, to help you later.
Basically, the key points here are the yellow lines, because that is where I initialize the backtracking engine.
I will try to solve our problem symbolically, using the opcodes.
I'm logging each executed instruction.
And finally, I'm showing you only the effective instructions here.
I run our code.
Our code was written in Ruby.
There we have our initial opcodes.
And if you take a look, we have the instructions being executed one by one, with all the register information and flags.
And finally, at the top, we have EAX is equal to EAX plus ECX.
You see, it's possible to deobfuscate all the obfuscated code and find that the whole bunch of obfuscated instructions is equal to add eax, ecx.
So easy.
The effective instructions are those ones.
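The symbolic result described above can be illustrated without METASM. The sketch below is not the Ruby script from the slides and does not use METASM's backtracking engine; it is a toy symbolic evaluator in Python that only understands a handful of linear instructions, and the obfuscated sequence is invented (it is not the stage-four code from the talk).

```python
MASK = 0xFFFFFFFF
REGS = ("eax", "ebx", "ecx", "edx")

# Hypothetical obfuscated form of `add eax, ecx`.
OBFUSCATED = [
    ("push", "ebx"),
    ("mov", "ebx", "ecx"),
    ("sub", "eax", 0x1337),
    ("add", "eax", "ebx"),
    ("add", "eax", 0x1337),
    ("pop", "ebx"),
]


def const(value):
    """Linear expression for a constant; the symbol "1" holds the constant term."""
    return {"1": value & MASK} if value & MASK else {}


def combine(a, b, sign):
    """Return a + sign*b for two linear expressions, modulo 2**32."""
    out = dict(a)
    for sym, coeff in b.items():
        new = (out.get(sym, 0) + sign * coeff) & MASK
        if new:
            out[sym] = new
        else:
            out.pop(sym, None)
    return out


def symbolic_run(code):
    # Every register starts as its own unknown initial value: eax -> 1*eax0.
    state = {r: {r + "0": 1} for r in REGS}
    stack = []

    def value(operand):
        return state[operand] if operand in state else const(operand)

    for mnemonic, *ops in code:
        if mnemonic == "mov":
            state[ops[0]] = dict(value(ops[1]))
        elif mnemonic == "add":
            state[ops[0]] = combine(state[ops[0]], value(ops[1]), +1)
        elif mnemonic == "sub":
            state[ops[0]] = combine(state[ops[0]], value(ops[1]), -1)
        elif mnemonic == "push":
            stack.append(dict(value(ops[0])))
        elif mnemonic == "pop":
            state[ops[0]] = stack.pop()
        else:
            raise ValueError("unsupported instruction: " + mnemonic)
    return state


def show(expr):
    """Pretty-print a linear expression such as {'eax0': 1, 'ecx0': 1}."""
    terms = []
    for sym, coeff in sorted(expr.items()):
        if sym == "1":
            terms.append(hex(coeff))
        else:
            terms.append(sym if coeff == 1 else f"{coeff:#x}*{sym}")
    return " + ".join(terms) or "0"


if __name__ == "__main__":
    final = symbolic_run(OBFUSCATED)
    print("eax =", show(final["eax"]))  # prints eax = eax0 + ecx0
```

The junk constant cancels out and the saved register is restored, so the only effective change is that EAX becomes its initial value plus the initial ECX. A real framework does the same for the full instruction set, including flags and memory.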
Additionally, I tried emulation, using a very nice combination of Keystone and uEmu.
uEmu is an emulator plugin for IDA Pro.
Once again, I show you how to install Keystone here.
This time, my choice was to write a C program.
I tried to comment line by line again to help.
If you take a look, the key lines are the second one and the third one.
In the second one, I'm creating a Keystone engine.
Keystone is an assembler,
so I'm assembling the instructions into hexadecimal opcodes.
I executed it, and in the middle I got the hexadecimal equivalent of our instructions.
I saved it. The program was right.
I made another program, this time using Capstone.
Capstone does the inverse process.
This time, I'm disassembling.
I inserted our output from the last slide.
It was right. I got the same instructions that I had inserted.
But returning to our problem, I opened our file, defcon 2009.bin, in IDA Pro.
It opened perfectly.
I set EAX to 3.
I set ECX to 6.
And I emulated it using uEmu.
Finally, I found that EAX is equal to EAX plus ECX, which is 9.
In this case, I solved it numerically.
I also used Unicorn.
Unicorn is an emulator.
Once again, I wrote a very simple program in C.
I inserted our hexadecimal here,
and I created a macro named Defcon Code.
I tried to comment this program line by line, or block by block.
My comments are in light blue.
Here, we have two key points.
First, I set EAX to 4.
I set ECX to 7.
And I set the stack size.
I started the emulation here, in the first line in yellow.
Once more, I executed our program.
Our initial values are EAX 4 and ECX 7.
The instructions are executed line by line.
And our final result is EAX equal to EAX plus ECX, that is, 0xB, or 11 in decimal.
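The numerical approach can be imitated with a toy concrete emulator. This sketch does not use Unicorn and is not the C program from the slides; it interprets a small invented instruction list in Python, and the sequence below is a hypothetical obfuscated form of `add eax, ecx`, not the opcodes from the talk.

```python
MASK = 0xFFFFFFFF

# Hypothetical obfuscated form of `add eax, ecx`.
OBFUSCATED = [
    ("push", "ebx"),
    ("mov", "ebx", "ecx"),
    ("sub", "eax", 0x1337),
    ("add", "eax", "ebx"),
    ("add", "eax", 0x1337),
    ("pop", "ebx"),
]


def emulate(code, eax, ecx, trace=False):
    """Run the toy code with concrete 32-bit values and return the registers."""
    regs = {"eax": eax & MASK, "ebx": 0, "ecx": ecx & MASK, "edx": 0}
    stack = []

    def value(operand):
        return regs[operand] if isinstance(operand, str) else operand & MASK

    for mnemonic, *ops in code:
        if mnemonic == "mov":
            regs[ops[0]] = value(ops[1])
        elif mnemonic == "add":
            regs[ops[0]] = (regs[ops[0]] + value(ops[1])) & MASK
        elif mnemonic == "sub":
            regs[ops[0]] = (regs[ops[0]] - value(ops[1])) & MASK
        elif mnemonic == "push":
            stack.append(value(ops[0]))
        elif mnemonic == "pop":
            regs[ops[0]] = stack.pop()
        else:
            raise ValueError("unsupported instruction: " + mnemonic)
        if trace:
            # Log every executed instruction with the register state.
            print(mnemonic, ops, {r: hex(v) for r, v in regs.items()})
    return regs


if __name__ == "__main__":
    result = emulate(OBFUSCATED, eax=4, ecx=7, trace=True)
    print("EAX =", hex(result["eax"]))  # prints EAX = 0xb
```

With the same initial values as in the talk, EAX 4 and ECX 7, the toy run ends with EAX equal to 0xB. A concrete run only proves the result for the chosen inputs, which is why the symbolic approach is also useful.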
I've also used Miasm.
Miasm is another amazing framework that works very well on different platforms.
Once again, I show you how to install Miasm step by step to help you.
I tested Miasm.
It generates a very nice graph.
So let's see how it works.
I set a breakpoint here.
Once I run it, we have our code disassembled.
Each instruction is executed line by line.
And finally, I have our numerical result here.
Additionally, I also have a version of our program that uses symbolic execution.
It's almost the same.
The only change is at the bottom.
I use a symbolic execution engine here.
I execute our program once more, instruction by instruction.
And finally, I have EAX is equal to the initial EAX plus the initial ECX.
As you see, I'm able to handle a very simple piece of code by using different platforms, by using different frameworks.
I've also used Triton.
Triton is another very interesting platform.
Triton supports x86 and x64 architectures.
Triton supports symbolic execution.
In this case, I'm able to emulate only part of my program,
but I can use concrete execution to analyze the whole program.
Once again, I show you how to install Triton,
in this case without using Pin from Intel,
and in the next slide using Pin from Intel.
Step by step, it's working.
I wrote a very simple program in Python.
I start with our code here, in hexadecimal.
Again, I tried to comment line by line, block by block, in blue to help you.
And here is something very interesting, because some people don't know how to convert from opcodes to hex code.
So I show you how to do that using rasm2 from radare2.
Of course, you can use IDA Pro, Ghidra, and so on to do that, but I show you how to do it using rasm2 from radare2.
So I executed our Python program using Triton, and I was able to solve our problem symbolically.
Once again, all instructions are shown.
And finally, I know that all the opcodes are equivalent to a simple add operation.
I also tried to solve it with a numerical approach using the same Triton framework.
I wrote another program in Python.
I inserted the same code there, the same opcodes.
I set up the initial value of each register: ESP, EBP, EAX, EBX, ECX, EDX.
I set up the entry point address, and I started the processing.
Each instruction is executed one by one.
And finally, at the bottom, in yellow, we have the answer.
We can use radare2 to handle our problem.
I start radare2 using 32 bits.
I loaded the program.
As you can see, there are many ESIL commands here.
I set EAX to 7 and ECX to 2.
I set the breakpoint here.
And I ran our program.
As you see, we have EAX is equal to EAX plus ECX, which is 9.
We can integrate radare2 with Miasm.
In this case, Miasm is the working engine.
It's quite easy to do that.
I show it here step by step.
And I ran radare2 with Miasm integrated.
As you see, we can see all the commands here coming from Miasm but translated to ESIL commands in radare2.
DTrace is an outstanding tool.
DTrace was introduced in Solaris 10, 15 years ago.
It's an amazing dynamic tracing tool.
And recently, two months ago, DTrace was ported to Windows.
Honestly, DTrace is a set of probes.
Basically, probes are scattered across the kernel.
Each time a probe fires, it runs an action written in a very unusual language, the D language.
The general format of a probe is provider, module, function, and name.
Provider is the library.
Module is the kernel module.
Function is the system call.
And name is the name of the probe.
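The four-part probe format can be made concrete with a small parser. This is a simplified sketch in Python, not part of DTrace: it assumes all four colon-separated fields are written out (DTrace itself also accepts shorter forms), and the `Probe` type and `parse_probe` function are names invented for this example.

```python
from typing import NamedTuple


class Probe(NamedTuple):
    provider: str
    module: str
    function: str
    name: str


def parse_probe(description: str) -> Probe:
    """Split a `provider:module:function:name` description into its fields.

    An empty field is kept as an empty string; in DTrace an empty field
    matches anything.
    """
    parts = description.split(":")
    if len(parts) != 4:
        raise ValueError("expected provider:module:function:name")
    return Probe(*parts)


if __name__ == "__main__":
    probe = parse_probe("syscall:::entry")
    print(probe.provider, repr(probe.module), repr(probe.function), probe.name)
```

For `syscall:::entry` the provider is `syscall`, the module and function are left empty, and the probe name is `entry`, so the description covers the entry of every system call.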
I show you how to install DTrace on Windows 10.
It's quite easy to do that.
And here, I show some commands using DTrace.
I list all the probes using dtrace -l.
I list all the probes related to system calls,
in this case read system calls, write system calls, and view system calls.
It's quite easy to do that.
Here's a very short DTrace example.
In this case, I'm counting the number of times each system call is called while I'm using Notepad, for example.
Another very interesting example: I ran this very simple DTrace command, and I could get the running processes on my machine.
This is a slightly more complicated sample.
In this case, I trace Chrome for only five seconds.
Another example here: I list the number of times each system call is called on my whole machine.
DTrace has a very interesting provider named Function Boundary Tracing.
Using Function Boundary Tracing, I'm able to trace the system calls being called on the kernel side.
To do that, I need to attach a kernel debugger to Windows 10, and this enables Function Boundary Tracing.
For example, in this case, I can use WinDbg to attach to Windows 10 and enable Function Boundary Tracing.
Look at that.
I can trace all system calls related to NTFS for the WinHard program.
It's quite easy.
I've been using that to analyze some malware.
Of course, unfortunately, DTrace is very new on Windows, and some problems happen.
In my case, my system crashed, and I provide you with a very short demo involving the main module of DTrace.
I show you step by step how I investigated the problem.
And finally, I found the guilty module, the one in green.
I didn't have enough time to notify Microsoft, but that's it.
Finally, a few months ago, I saw a very strange anti-virtual machine technique being used by an advanced protector.
Usually, most malware samples use different anti-virtual machine techniques to detect virtual machines.
You can write a C# program to detect a virtual machine.
It's quite easy.
You can use, for example, the Win32_BIOS management class to do that.
And I show you here a very, very simple piece of code to detect virtual machines.
I ran this code.
I commented some lines there at the top.
I commented the functions being used.
And at the bottom, as you see, I ran my program on a physical host and in a guest virtual machine.
As you see, it's quite easy to know that on the right side I'm using a virtual machine.
However, that's not the issue.
I saw a very strange technique using temperature.
I thought, whoa, how can I use temperature to detect virtual machines?
I tried to reproduce this technique using C#.
I wrote a very short program, and I got an error at the bottom: a NullReferenceException.
I tried to investigate what happened.
I used the Windows Management Instrumentation Tester to do that, at the bottom: one, two, three, and four.
And I found that in a virtual machine, I don't have temperature probes.
Windows does not offer me any kind of temperature probe.
I highlighted it in the middle of the picture.
So, it was easy.
I translated this fact into my program,
and I added an exception handler to handle these strange situations.
Finally, my program was able to use temperature to tell a physical host from a virtual machine.
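The decision logic of the temperature check, including the exception handler, can be sketched as follows. This is not the C# program from the slides. The `read_thermal_zones` callable is hypothetical: on a real Windows host it would wrap a WMI query for thermal data (for example the `MSAcpi_ThermalZoneTemperature` class, which is my assumption, since the transcript does not name the class), and here it is injected so the logic can run on any machine.

```python
def looks_like_vm(read_thermal_zones):
    """Return True when no temperature probe can be read.

    `read_thermal_zones` is a hypothetical callable supplied by the caller.
    It should return a list of temperature readings, and it may raise when
    the underlying query is not supported.
    """
    try:
        zones = read_thermal_zones()
    except Exception:
        # The talk describes the query failing inside a guest,
        # which is why an exception handler is needed.
        return True
    return not zones


# Stand-ins for the real query, used only to exercise the logic.
def fake_physical_host():
    return [3182]  # made-up raw reading


def fake_guest():
    raise RuntimeError("Not supported")


if __name__ == "__main__":
    print(looks_like_vm(fake_physical_host))  # prints False
    print(looks_like_vm(fake_guest))          # prints True
```

The check is only a heuristic: a physical machine without an exposed thermal sensor would be misclassified in the same way as a guest.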
I've been using several techniques to circumvent this trick.
But I believe that you'll see this technique more often.
Finally, my conclusions.
People have asked me how to bypass anti-reversing techniques.
First, you need to know all the techniques being used.
After that, you try to use different frameworks such as Miasm, METASM, Triton, and so on.
Emulation is a good alternative, of course.
And DTrace is a new old tool that was recently ported to Windows, and I have been using it to solve some reversing problems.
I would like to thank you and the DEF CON staff for attending my talk.
Have a nice day.
Any questions? Questions? Please.
Sorry, I'm just wondering how you deal with programs that have been obfuscated by Themida?
Sorry?
By Themida. Programs being protected, I mean, virtual machine-based malware, like Themida.
There are different tricks to prevent my virtual machine from being detected by a malware sample.
Usually I try to set up a very specific virtual machine,
and I modify the VMX file to prevent it from being detected.
I mean, my question is, what if it has been protected by a VM-based protector, for instance Themida?
Themida?
Yeah, Themida. It uses a VM itself.
Yes, I use a VM, yes.
It uses a VM. I mean, with Themida, the malware itself uses a VM, just like a Java program.
It uses a just-in-time technique to compile into bytecode, execute one line, discard that bytecode, and then execute the next line.
Basically, this kind of technique has been used in Themida.
I was trying to reverse that myself, but I was not very successful.
So, I was wondering what your take on that is.
I have many tricks to do that.
I can't talk about all of them. I can't explain every single point, because it's a very, very long topic.
It's very complicated, and I don't know if there is any known approach.
So, I was just wondering if the emulation approach that you suggested could work for that.
Better, better. I think emulation is a nice approach.
And it's metamorphic, so sometimes you use a high-level semantic approach, I mean, because it's not matching byte by byte.
It's a little bit high level, high-level semantics.
What I mean is, it's basically metamorphism.
It's not just simple polymorphism. It's metamorphism.
The code itself changes every time.
And the VM itself has been changed.
Every time, several times, in a virtual machine, for example.
And I try to find a pattern, and I try to find relations between these patterns.
So I try to make a table, a kind of mapping table, to better understand how the virtual machine from Themida works.
I've used the same approach with other protectors such as VMProtect, Arxan, Agile.NET, and so on.
So the approach is almost the same.
For example, I try to split the instructions into different classes:
jump instructions, conditional instructions, and so on.
I try to map Intel instructions to virtualized instructions,
and I try to establish a relationship between each Intel instruction and a virtualized instruction.
However, most people believe that all Intel instructions are virtualized.
It's not true. Only a few of them are.
So I try to understand which instructions were virtualized.
I think everything has been virtualized except for the virtual machine itself.
I mean, the virtual machine itself, they could run several levels of virtual machine.
Yes, yes.
I'm just wondering, to make it quick, what's your success rate?
I mean, have you been able to successfully reverse every one of them, or do you sometimes also have difficulties?
Most of the time I have problems to handle.
I fail most of the time.
Yes, but I've had some success, about 50%.
Yes. Honestly, I would like to have 100%.
I hope so. Thank you very much.
Any further questions?
No more questions?
Thank you so much for
